Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix uns merge 3d #1302

Merged
merged 7 commits into from
Jan 15, 2024
Merged

Fix uns merge 3d #1302

merged 7 commits into from
Jan 15, 2024

Conversation

ivirshup
Copy link
Member

@ivirshup ivirshup commented Jan 12, 2024

This was a little harder than expected due to our need to handle non-numeric dtypes. We were using dataframe comparisons to handle that.

Now we are using numpy when able, but switching to dataframes constructed from reshaped arrays when we can't.

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a8f0b76) 0.00% compared to head (6e6c55f) 85.70%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##           main    #1302       +/-   ##
=========================================
+ Coverage      0   85.70%   +85.70%     
=========================================
  Files         0       34       +34     
  Lines         0     5450     +5450     
=========================================
+ Hits          0     4671     +4671     
- Misses        0      779      +779     
Flag Coverage Δ
gpu-tests 52.03% <83.33%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
anndata/_core/merge.py 94.36% <100.00%> (ø)
anndata/tests/helpers.py 96.14% <100.00%> (ø)

... and 32 files with indirect coverage changes



def gen_3d_recarray(_):
# Ignoring n as it can get quite slow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean exactly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part?

recarray refers to a record array: https://numpy.org/doc/stable/reference/generated/numpy.recarray.html#numpy-recarray

n is a parameter passed to all the other functions here. Generally it changes the size of the result, but we ignore that in a couple cases. Here I ignore it because it made some test cases take over a minute.

anndata/_core/merge.py Outdated Show resolved Hide resolved
anndata/_core/merge.py Outdated Show resolved Hide resolved
@ivirshup ivirshup requested a review from ilan-gold January 15, 2024 10:25
if a.shape != b.shape:
return False

return equal(pd.DataFrame(a.reshape(-1)), pd.DataFrame(b.reshape(-1)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does _.reshape(-1) mean? The numpy docs say

One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.

So what you do basically creates 1D pd.DataFrames? Shouldn’t we make and compare pd.Series then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make sense, but in practice the Series constructor acts differently and throws errors for some of the non-numeric dtypes.

@ivirshup ivirshup merged commit 0fa245d into scverse:main Jan 15, 2024
14 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/anndata that referenced this pull request Jan 15, 2024
ivirshup added a commit that referenced this pull request Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checking array equality during merging doesn't work for >2d
3 participants